-
Notifications
You must be signed in to change notification settings - Fork 81
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update to processing of external data about vaccines #1104
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me overall. I didn't try running the code. You could add the .json file to the commit so we can inspect it before merging.
Are you concerned the that the conda environment may not build now that you added new packages? We could wait to see what happens in GitHub Actions or run a local dry run to check whether the versions can all be resolved.
owiddata/generate-owiddata-stats.py
Outdated
# If they add a new category (which seems unlikely), handle & throw error | ||
if vaxtype not in types.keys(): | ||
print("Unknown vaccine platform:", vaxtype) | ||
return vaxtype |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want this to return or exit with an error? Exiting is extreme, but we don't check the log files for error messages.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this should ever happen because I think it handles all the major categories of vaccines that currently exist? I just wrote it in as a just-in-case. So I think either would be fine!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In that case, I prefer we exit with an error. That should happen rarely if ever, so we would want to know about the new major vaccine category.
owiddata/generate-owiddata-stats.py
Outdated
vaccines = vaccine_df[vaccine_df["Category"] == category] | ||
numTypes = len(set(vaccines["Type"].to_list())) | ||
if numTypes > 1: | ||
return vaccines[["Company", "Type"]].to_markdown() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure how the JSON will store the markdown. We can see what happens and adjust if needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm curious if this will work!!! In theory I think the JSON looked ok but there are some special characters that could make things interesting
This is looking good. Once the code is stable, could you please run it locally and commit the output images and json file so we can inspect them before merging? |
Thank you @agitter for the feedback! This runs in conda and I believe all the components are here now! I will assume we want to keep the standardized color-coding across all figures, but I'm happy to change this if the shading is too faint for some of the newer methods. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. I think this looks ready to merge.
@agitter I think this aligns the datasets well now. I did write in a fairly bulky output so that we can investigate things that seem weird (above). It prints all of the info for vaccines in the VIPER dataset that aren't found in the OWID data. Logically that is the only direction we need (since VIPER tracks candidates and OWID only tracks shots in arms) but I'm also happy to add in another check if we'd feel better checking both directions! |
I trust your judgement about how to best confirm the matching across data sources worked well. This seems good enough to me. The country counts in the json output have been restored to their expected numbers. |
@agitter thank you so much for all of the help with getting this to work! It turned out way more complicated than I initially thought. I'll merge now, fingers crossed it integrates well. |
I manually triggered the external resources build from here so that the figure URLs in the json file get updated with actual commits. That will also let us confirm everything runs in the updated environment. |
The workflow ran successfully, but it took 5 hours (!) for the conda environment to install. I'm not sure what would cause it to be so slow. We could switch to mamba at some point. |
@agitter Oh my gosh, five hours??? When I generate my conda environment.yml file locally, there are a lot of other dependencies, so it does seem like a complex environment. Sorry if these changes are slowing things down!!!! |
Description of the proposed additions or changes
This PR updates the OWID external analysis to include data about each vaccine scraped from https://covid19.trackvaccines.org/
This will allow us to avoid needing to manually update the types of each vaccine
Additionally, I added in the code needed to store the URL associated with each vaccine in case we want to pull other data from that site down the road. (Since most of the data is on a separate page where the url contains an arbitrary number assigned to each vaccine)
I also added some new outputs to the json file, including markdown tables listing approved vaccines in each category
The map file names for the inactivated whole virus vaccines has changed. The incorrect reference is removed in #1105
One other concern: I manually added the packages I used to environment.yml. Checking the environment in conda generates a huge list because it's the project as a whole. I think it will be clear whether this worked based on whether it works OK, but I would guess there's likely a better way to do this!
Related issues
closes #1101
Suggested reviewers (optional)
Checklist